TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages

Authors

  • Tiago C. Silva
  • Antonio Colaprico
  • Catharina Olsen
  • Fulvio D'Angelo
  • Gianluca Bontempi
  • Michele Ceccarelli
  • Houtan Noushmehr
  • Charlotte Soneson
  • Elena Papaleo
  • Tiago Chedraoui Silva
  • Kyle Ellrott

Abstract

Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics), and there is no comprehensive tool that provides a complete integrative analysis harnessing the resources and data provided by all three public projects. The need to integrate these different analyses was recently proposed. In this workflow, we provide a series of biologically focused integrative downstream analyses of different molecular data. We describe how to download, process and prepare TCGA data. By harnessing several key Bioconductor packages, we describe how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a workplan to identify candidate biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiforme, or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPseeker, ComplexHeatmap, pathview, ELMER, gaia, minet, RTCGAToolbox, TCGAbiolinks.

Similar resources

TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data

The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's rese...

recount workflow: Accessing over 70,000 human RNA-seq samples with Bioconductor

The recount2 resource is composed of over 70,000 uniformly processed human RNA-seq samples spanning TCGA and SRA, including GTEx. The processed data can be accessed via the recount2 website and the recount Bioconductor package. This workflow explains in detail how to use the recount package and how to integrate it with other Bioconductor packages for several analyses that can be carried out wi...

Using Free and Open-Source Bioconductor Packages to Analyze Array Comparative Genomics Hybridization (aCGH) Data

Whole-genome array Comparative Genomics Hybridization (aCGH) can be used to scan chromosomes for deletions and amplifications. Because of the increased accessibility of many commercial platforms, a lot of cancer researchers have used aCGH to study tumorigenesis or to predict clinical outcomes. Each data set is typically in several hundred thousands to one million rows of hybridization measureme...

TCGA-Assembler: Pipeline for TCGA Data Downloading, Assembling, and Processing (Supplementary Methods)

The Cancer Genome Atlas (TCGA) is supported by the National Cancer Institute and the National Human Genome Research Institute to chart the molecular landscape of tumor samples for more than 20 types of cancer [1-3]. TCGA has been generating multi-modal genomics, epigenomics, and proteomics data for thousands of cancer patients, providing unprecedented opportunities for researchers to systematic...

Journal title:

Volume 5, Issue

Pages -

Publication date: 2016